Abstract — This paper addresses the problem of distributed 
coding of images whose correlation is driven by the motion 
of objects or positioning of the vision sensors. It concentrates 
on the problem where images are encoded with compressed 
linear measurements. We propose a geometry-based correlation 
model in order to describe the common information in pairs of 
images. We assume that the constitutive components of natural 
images can be captured by visual features that undergo local 
transformations (e.g., translation) in different images. We first 
identify prominent visual features by computing a sparse ap- 
proximation of a reference image with a dictionary of geometric 
basis functions. We then pose a regularized optimization problem 
to estimate the corresponding features in correlated images given 
by quantized linear measurements. The estimated features have 
to comply with the compressed information and to represent 
consistent transformation between images. The correlation model 
is given by the relative geometric transformations between cor- 
responding features. We then propose an efficient joint decoding 
algorithm that estimates the compressed images such that they 
stay consistent with both the quantized measurements and the 
correlation model. Experimental results show that the proposed 
algorithm effectively estimates the correlation between images in 
multi-view datasets. In addition, the proposed algorithm provides 
effective decoding performance that compares advantageously 
to independent coding solutions as well as state-of-the-art dis- 
tributed coding schemes based on disparity learning. 

Index Terms — Random projections, sparse approximations, 
correlation estimation, geometric transformations, quantization. 

I. Introduction 

IN recent years, vision sensor networks have been gaining
ever-increasing popularity, driven by the availability
of cheap semiconductor components. These networks typically
produce highly redundant information, so an efficient
estimation of the correlation between images becomes essential
for effective coding, transmission and storage applications.
The distributed coding paradigm becomes particularly
attractive in such settings; it makes it possible to exploit the
correlation between images efficiently with low encoding complexity
and minimal inter-sensor communication, which translates into
power savings in sensor networks. One of the most
challenging tasks, however, resides in the proper modeling and
estimation of the correlation between images.

In this paper, we consider the problem of finding an efficient 
distributed representation for correlated images, where the 
common objects are displaced due to the viewpoint changes 
or motion in dynamic scenes. In particular, we are interested 
in a scenario where the images are given in the form of
a few quantized linear measurements computed by very simple
sensors. Even with such a simple acquisition stage, the images
can be reconstructed under the condition that they have a
sparse representation in a particular basis (e.g., DCT, wavelet)
that is sufficiently different from the sensing matrices [3],
[4]. Rather than independent image reconstruction, we are
however interested in the joint reconstruction of the images
and in particular the estimation of their correlation from the
compressed measurements. In contrast to most distributed
compressive schemes in the literature, we want to estimate
the correlation prior to image reconstruction for improved
robustness at low coding rates.

We propose to model the correlation between images as 
geometric transformations of visual features, which provides 
a more efficient representation than block-based translational 
models that are commonly used in state-of-the-art coding 
solutions. We first compute the most prominent visual fea- 
tures in a reference image through a sparse approximation 
with geometric functions drawn from a parametric dictionary. 
Then, we formulate a regularized optimization problem whose 
objective is to identify in the compressed images the features 
that correspond to the prominent components in the reference 
images. Correspondences then define relative transformations 
between images that form the geometric correlation model. A 
regularization constraint ensures that the estimated correlation 
is consistent and corresponds to the actual motion of visual 
objects. We then use the estimated correlation in a new joint 
decoding algorithm that approximates the multiple images. 
The joint decoding is cast as an optimization problem that 
warps the reference image according to the transformation 
described in the correlation information, while enforcing the 
decoded images to be consistent with the quantized measure- 
ments. We finally propose an extension of our algorithm to 
the joint decoding of multi-view images. 

While our novel framework could find applications in
several problems such as distributed video coding or multi-view
imaging, we focus on the latter for illustrating the joint
decoding performance. We show by experiments that the proposed
algorithm computes a good estimation of the correlation
between multi-view images. In particular, the results confirm
that dictionaries based on geometric basis functions
capture the correlation more efficiently than a dictionary built
on patches or blocks from the reference image [5]. In addition,
we show that the estimated correlation model can be used
to decode the compressed image by disparity compensation.
Such a decoding strategy outperforms independent
coding solutions based on JPEG 2000 and state-of-the-art
distributed coding schemes based on disparity learning [6], [7]
in terms of rate-distortion (RD) performance, owing to accurate
correlation estimation. Finally, the experiments show that
enforcing consistency in image prediction is very effective in
increasing the decoding quality when the images are given by
quantized linear measurements.

The rest of this paper is organized as follows. Section
II briefly overviews the related work with an emphasis on
reconstruction from random projections and distributed coding
algorithms. The geometric correlation model used in our
framework is presented in Section III. Section IV describes
the proposed regularized energy model for an image pair and
the optimization algorithm. The consistent image prediction
algorithm is described in Section V. Section VI describes
the extension of our scheme to multi-view images. Finally,
experimental results are presented in Section VII and Section
VIII concludes this paper.

II. Related work 

We present in this section a brief overview of the related
works in distributed image coding, where we mostly focus
on simple sensing solutions based on linear measurements.
In recent years, signal acquisition based on random projections
has received significant attention in many
applications like medical imaging, compressive imaging or
sensor networks. Donoho [3] and Candes et al. [4] have shown
that a small number of linear measurements contain enough
information to reconstruct a signal, as long as it has a sparse
representation in a basis that is incoherent with the sensing
matrix [8]. Rauhut et al. [9] extend the concept of signal
reconstruction from linear measurements using redundant dictionaries.
Signal reconstruction from linear measurements has
been applied to different applications such as image acquisition
[10], [11], [12] and video representation [13], [14], [15].

At the same time, the key in effective distributed representa- 
tion certainly lies in the definition of good correlation models. 
Duarte et al. [16], [17] have proposed different correlation
models for the distributed compression of correlated signals
from linear measurements. In particular, they introduce three
joint sparsity models (JSM) in order to exploit the inter-signal
correlation in the joint reconstruction. These three sparse
models are respectively described by (i) JSM-1, where the
signals share a common sparse support plus a sparse innovation
part specific to each signal; (ii) JSM-2, where the signals
share a common sparse support with different coefficients, and
(iii) JSM-3 with a non-sparse common signal with individual
sparse innovation in each signal. These correlation models
permit a joint reconstruction with a reduced sampling rate or
equivalently a smaller number of measurements compared to
independent reconstruction for the same decoding quality. The
sparsity models developed in [16] have then been applied to
distributed video coding [18], [19] with random projections.
The scheme in [18] used a modified gradient projection
algorithm for sparse reconstruction [20] for the joint signal
reconstruction. The authors in [19] have proposed a distributed
compressive video
coding scheme based on the sparse recovery with decoder
side information. In particular, the prediction error between
the original and side information frames is assumed to be
sparse in a particular orthonormal basis (e.g., wavelet basis).
Another distributed video coding scheme has been proposed
in [5], which relies on an inter-frame sparsity model. A block
of pixels in a frame is assumed to be sparsely represented by
a linear combination of the neighboring blocks from the decoded
key frames. In particular, an adaptive block-based dictionary
is constructed from the previously decoded key frames and
eventually used for signal reconstruction. Finally, iterative
projection methods are used in [21], [22] in order to ensure
a joint reconstruction of correlated images that are sparse in
a dual tree wavelet transform basis and at the same time
consistent with the linear measurements in multi-view settings.
In general, state-of-the-art distributed compressive schemes
[18], [19], [21], [22] estimate the correlation model from two
reconstructed reference images, where the reference frames
are reconstructed from the respective linear measurements by
solving an $\ell_2$-TV or $\ell_2$-$\ell_1$ optimization problem. Unfortunately,
reconstructing the reference images based on solving an $\ell_2$-$\ell_1$
or $\ell_2$-TV optimization problem is computationally expensive
0, 0|. Also, the correlation model estimated from highly 
compressed reference images usually fails to capture the actual 
geometrical relationship between images. Motivated by these 
issues, we estimate in this paper a robust correlation model 
directly from the highly compressed linear measurements 
using a reference image, without explicitly reconstructing the 
compressed images. 

In multi-view imaging or distributed video coding, the cor- 
relation is explained by the motion of objects or the change of 
viewpoint. Block-based translation models that are commonly 
used for correlation estimation fail to efficiently capture the 
geometry of objects. This results in a poor correlation model,
especially with low resolution images. Furthermore, most of
the above mentioned schemes (except [5]) assume that the
signal is sparse in a particular orthonormal basis (e.g., DCT or 
Wavelet). This is also the case of the JSM models described 
above which cannot be used to relate the scene objects by 
means of a local transform, and unfortunately fail to provide 
an efficient joint representation of correlated images at low bit 
rates. It is more generic to assume the signals to be sparse in 
a redundant dictionary which allows greater flexibility in the 
design of the representation vectors. The most prominent ge- 
ometric components in the images can be captured efficiently 
by dictionary functions. The correlation can be then estimated 
by comparing the most prominent features in different images.
Few works have been reported in the literature for the
estimation of a correlation model using redundant structured
dictionaries in multi-view [23] or video applications [24].
However, these frameworks do not construct the correlation
model from the linear measurements. In general, most of the
schemes in classical disparity and motion estimation focus on
estimating correlation from original images [25], [26], and not
from compressed images. We rather focus here on estimating
the correlation from compressed images where the image
is given with random linear measurements. The correlation
model is built using the geometric transformations captured by
a structured dictionary, which leads to an effective estimation
of the geometric correlation between images.

Fig. 1. Schematic representation of the proposed scheme. The images $I_1$ and $I_2$ are correlated through displacement of scene objects due to viewpoint change.

Finally, the distributed schemes in the literature that are 
based on compressed measurements usually fail to estimate 
the actual number of bits for the image sequence representation
(except [5]), and hence cannot be applied directly in
practical coding applications. Quantization and entropy coding
of the measurements is actually an open research problem
for the two following reasons: (i) the reconstructed signal
from quantized measurements does not necessarily satisfy
the consistent reconstruction property [27]; (ii) the entropy
of the measurements is usually large, which leads to unsatisfactory
coding performance in imaging applications [28].
Hence, it is essential to adapt the quantization techniques
and reconstruction algorithms in order to reduce the distortion
in the reconstructed signal, as in [29], [30]. The authors
in [31], [32] have also studied the asymptotic reconstruction
performance of the signal under uniform and non-uniform
quantization schemes. They have shown that a non-uniform
quantization scheme usually gives smaller distortion in the
reconstructed signal compared to a uniform quantization
scheme. Recently, an optimal quantization strategy for the random
measurements has been designed based on distributed
functional scalar quantizers [33]. In this paper, we use a
simple quantization strategy for realistic compression along 
with consistent prediction constraints in the joint decoding of 
correlated images in order to illustrate the potential of low 
complexity sensing solutions in practical multi-view imaging 
applications. 

III. Framework 

We consider a pair of images $I_1$ and $I_2$ (with resolution
$N = N_1 \times N_2$) that represent a scene taken from different
viewpoints; these images are correlated through motion of
visual objects. The captured images are encoded independently
and are transmitted to a joint decoder. The joint decoder estimates
the relative transformations between the received signals
and jointly decodes the images. While the description is given
here for pairs of images, we later extend the framework to
multiple images in Section VI.

We focus on the particular problem where one of the images
serves as a reference for the correlation estimation and the
decoding of the second image, as illustrated in Fig. 1. While the
reference image $I_1$ could be encoded with any compression
algorithm (e.g., JPEG, compressed sensing framework [12]), we
choose here to encode the reference image $I_1$ with JPEG 2000
coding solutions. Next, we concentrate on the independent
coding and joint decoding of the second image, where the first
image $I_1$ serves as side information. The second image $I_2$ is
projected on a random matrix $\Phi$ to generate the measurements
$y_2 = \Phi I_2$. The measurements $y_2$ are quantized with a uniform
quantization algorithm and the quantized linear measurements
are finally compressed with an entropy coder.
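
As a rough illustration of this acquisition stage, the following Python sketch projects an image on a random matrix and applies a uniform scalar quantizer. The Gaussian sensing matrix, the step size `step`, the mid-point reconstruction value and the function name `encode_measurements` are assumptions made for the example, not choices stated in the text; the entropy coder is omitted.

```python
import numpy as np

def encode_measurements(image, num_measurements, step, seed=0):
    """Random projection followed by uniform quantization (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(image, dtype=float).reshape(-1)
    # Assumption: i.i.d. Gaussian sensing matrix; the text only says "random".
    phi = rng.standard_normal((num_measurements, x.size)) / np.sqrt(num_measurements)
    y = phi @ x                                  # unquantized measurements, Phi * I_2
    index = np.ceil(y / step).astype(int) - 1    # bin index such that lower < y <= upper
    lower = index * step                         # lower bound of the quantizer bin
    upper = lower + step                         # upper bound of the quantizer bin
    y_hat = lower + 0.5 * step                   # assumed reconstruction point (bin centre)
    return phi, y, y_hat, lower, upper
```

The decoder only needs `phi` (or the seed that generates it) and the bin bounds `lower` and `upper`; the unquantized vector `y` is returned here purely so that the quantization error can be inspected.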

At the decoder, we first estimate the prominent visual
features that carry the geometry information of the objects
in the scene. In particular, the decoder computes a sparse
approximation of the image $I_1$ using a parametric dictionary
of geometric functions. Such an approximation captures the
most prominent geometrical features in the image $I_1$. We then
estimate the corresponding features in the second image $I_2$
directly from the quantized linear measurements $\hat{y}_2$ without
explicit image reconstruction. In particular, the corresponding
features between images are related using a geometry-based
correlation model, where the correspondences describe local
geometric transformations between images. The correlation
information is further used to decode the compressed image
$I_2$ from the reference image $I_1$. We finally ensure a consistent
prediction of $I_2$ by explicitly considering the quantized measurements
$\hat{y}_2$ during the warping process. Before getting into
the details of the correlation estimation algorithm, we describe
the sparse approximation algorithm and the geometry-based
correlation model built on a parametric dictionary.

We describe now the geometric correlation model that is
based on matching the sparse geometric features in different
images. We first compute a sparse approximation of the reference
image $I_1$ using geometric basis functions in a structured
dictionary $\mathcal{D} = \{g_\gamma\}$, where $g_\gamma$ is called an atom. The
dictionary $\mathcal{D}$ is typically constructed by applying geometric
transformations (given by the unitary operator $U(\gamma)$) to a
generating function $g$ to form the atom $g_\gamma$. A geometric
transformation indexed by $\gamma$ consists of a combination of
operators for anisotropic scale by $s_x, s_y$, rotation by $\theta$, and
translation by $t_x, t_y$. For example, when $g$ is a Gaussian
function $g(x,y) = \frac{1}{\sqrt{\pi}} \exp(-(x^2 + y^2))$, the transformation
is expressed as

$$g_\gamma(x,y) = \frac{1}{\sqrt{\pi}} \exp\left(-(g_1^2 + g_2^2)\right), \qquad (1)$$

with $g_1 = \dfrac{\cos(\theta)(x - t_x) + \sin(\theta)(y - t_y)}{s_x}$
and $g_2 = \dfrac{\cos(\theta)(y - t_y) - \sin(\theta)(x - t_x)}{s_y}$.

Fig. 2. Sample Gaussian atoms with mother function $g(x,y) = \frac{1}{\sqrt{\pi}} \exp(-(x^2 + y^2))$ that undergo different sets of transformations.



In Fig. 2 we illustrate Gaussian atoms for different translation,
rotation and anisotropic scaling parameters. Now, we can
write the linear approximation of the reference image $I_1$ with
functions in $\mathcal{D}$ as

$$I_1 \approx \sum_{k=1}^{K} c_k \, g_{\gamma_k}, \qquad (2)$$

where $\{c_k\}$ represents the coefficient vector. The number $K$
of atoms used in the approximation of $I_1$ is usually much
smaller than the dimension of the image $I_1$. We use here a
suboptimal solution based on matching pursuit [34], [35] in
order to estimate the set of $K$ atoms.
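
To make the sparse approximation step concrete, here is a minimal Python sketch that samples an anisotropic Gaussian atom on a pixel grid and runs a plain matching pursuit over a small list of atoms. The function names, the unit-norm normalisation of the sampled atoms and the use of an explicit list as dictionary are assumptions of the example; a practical dictionary would be far larger and would not be enumerated this way.

```python
import numpy as np

def gaussian_atom(shape, tx, ty, theta, sx, sy):
    """Sample a translated, rotated, anisotropically scaled Gaussian on a grid."""
    y, x = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    g1 = (np.cos(theta) * (x - tx) + np.sin(theta) * (y - ty)) / sx
    g2 = (np.cos(theta) * (y - ty) - np.sin(theta) * (x - tx)) / sy
    atom = np.exp(-(g1 ** 2 + g2 ** 2)) / np.sqrt(np.pi)
    # Assumption: atoms are renormalised to unit l2 norm after sampling.
    return atom / np.linalg.norm(atom)

def matching_pursuit(image, atoms, num_atoms):
    """Greedy selection: pick the atom most correlated with the residual, subtract, repeat."""
    residual = np.asarray(image, dtype=float).copy()
    selected, coeffs = [], []
    for _ in range(num_atoms):
        scores = np.array([np.vdot(a, residual) for a in atoms])
        k = int(np.argmax(np.abs(scores)))
        selected.append(k)
        coeffs.append(scores[k])
        residual -= scores[k] * atoms[k]   # remove the contribution of the chosen atom
    return selected, coeffs, residual
```

The returned indices play the role of the parameters $\gamma_k$ and the returned coefficients the role of $c_k$ in the approximation of the reference image.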

The correlation between images can now be described by
the geometric deformation of atoms in different images [23],
[24]. Once the reference image $I_1$ is approximated as given
in Eq. (2), the second image $I_2$ could be approximated with
transformed versions of the atoms used in the approximation
of $I_1$. We can thus approximate $I_2$ as

$$I_2 \approx \sum_{k=1}^{K} c_k \, g_{\gamma'_k} = \sum_{k=1}^{K} c_k \, F^k(g_{\gamma_k}), \qquad (3)$$



where $F^k(g_{\gamma_k})$ represents a local geometrical transformation
of the atom $g_{\gamma_k}$. Due to the parametric form of the dictionary,
it is interesting to note that the transformation on $g_{\gamma_k}$ boils
down to a transformation of the atom parameters, i.e.,

$$F^k(g_{\gamma_k}) = U(\delta\gamma)\, g_{\gamma_k} = U(\delta\gamma \circ \gamma_k)\, g = g_{\delta\gamma \circ \gamma_k} = g_{\gamma'_k}. \qquad (4)$$

For clarity, we show in Fig. 3 a sample synthetic correlated
image pair and their sparse approximations using atoms in the
dictionary. We see that the sparse approximations of images
can be described with the transform $F$ of atom parameters.

The true transformations $\{F^k\}$ however are unknown in
practical distributed coding applications. Therefore, the main
challenge in our framework consists in estimating the local
geometrical transformations $\{F^k\}$ when the second image $I_2$
is only available in the form of quantized linear measurements.

Fig. 3. Illustration of the atom transform $F$ in the approximation of the
correlated images: (a) original correlated synthetic images; (b) sparse
approximation of the images using atoms in the dictionary. The rectangle and square
objects are each related by their own transformation $F$.



IV. Correlation estimation from compressed linear measurements

A. Regularized optimization problem 

We describe now our optimization framework for estimating
the correlation between images. Given the set of $K$ atoms
$\{g_{\gamma_k}\}$ that approximate the first image $I_1$, the correlation
estimation problem consists in finding the corresponding visual
patterns in the second image $I_2$ that is given only by
compressed random measurements $\hat{y}_2$. This is equivalent to
finding the correlation between images $I_1$ and $I_2$ with the joint
sparsity model based on local geometrical transformations, as
described in Section III.

In more detail, we are looking for a set of $K$ atoms in
$I_2$ that correspond to the $K$ visual features $\{g_{\gamma_k}\}$ selected
in the first image. We denote their parameters by $\Lambda$, where
$\Lambda = (\gamma'_1, \gamma'_2, \ldots, \gamma'_K)$ for some $\gamma'_k$, $\forall k$, $1 \le k \le K$. We
propose to select this set of atoms $\{g_{\gamma'_k}\}$ in a regularized
energy minimization framework as a trade-off between efficient
approximation of $I_2$ and smoothness or consistency of
the local transformations between images. The energy model
$E$ proposed in our scheme is expressed as

$$E(\Lambda) = E_d(\Lambda) + \alpha_1 E_s(\Lambda), \qquad \text{(OPT-1)}$$



where $E_d$ and $E_s$ represent the data and smoothness terms,
respectively. The regularization parameter $\alpha_1$ balances the
importance of the data and smoothness terms. The solution
to our correlation estimation is given by the set of $K$ atom
parameters $\Lambda^*$ that minimizes the energy $E$, i.e.,

$$\Lambda^* = \arg\min_{\Lambda \in \mathcal{S}} E(\Lambda). \qquad (5)$$

The parameter $\mathcal{S}$ represents the search space given by

$$\mathcal{S} = \{ (\gamma'_1, \gamma'_2, \ldots, \gamma'_K) \mid \gamma'_k = \delta\gamma \circ \gamma_k,\ 1 \le k \le K,\ \delta\gamma \in \mathcal{L} \}. \qquad (6)$$

The multidimensional search window $\mathcal{L} \subset \mathbb{R}^5$ is defined as
$\mathcal{L} = [-\delta t_x, \delta t_x] \times [-\delta t_y, \delta t_y] \times [-\delta\theta, \delta\theta] \times [-\delta s_x, \delta s_x] \times
[-\delta s_y, \delta s_y]$, where $\delta t_x, \delta t_y, \delta\theta, \delta s_x, \delta s_y$ determine the window
size for each of the atom parameters (i.e., translations $t_x, t_y$,
rotation $\theta$ and scales $s_x, s_y$). Even if our formulation is able to
handle complex transformations, they generally take the form
of motion vectors or disparity information in video coding
or stereo imaging applications. The label sets and the search
space $\mathcal{S}$ are drastically reduced in this case. The terms used
in OPT-1 are described in the next paragraphs.
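
The reduction of the search space mentioned above can be illustrated with a short Python sketch that enumerates a discretised window as a Cartesian product of per-parameter candidates. The function name `build_label_set`, the candidate values and the discretisation itself are hypothetical; how a candidate is composed with the reference parameters $\gamma_k$ is deliberately left out.

```python
from itertools import product

def build_label_set(dtx, dty, dtheta=(0.0,), dsx=(0.0,), dsy=(0.0,)):
    """Enumerate candidate transformations as a product of per-parameter candidates."""
    return list(product(dtx, dty, dtheta, dsx, dsy))

# Disparity-only case: horizontal shifts in [-4, 4], all other parameters fixed.
disparity_labels = build_label_set(range(-4, 5), (0,))

# Five-parameter window with a few (hypothetical) candidates per parameter.
full_labels = build_label_set(
    range(-4, 5), range(-4, 5), (-0.1, 0.0, 0.1), (-0.2, 0.0, 0.2), (-0.2, 0.0, 0.2)
)

print(len(disparity_labels), len(full_labels))  # prints "9 2187"
```

Restricting the transformations to horizontal disparity thus shrinks the label set from thousands of candidates to a handful, which is what makes the stereo case so much cheaper.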
B. Data cost function 

The data cost function computes (in the compressed domain)
the accuracy of the sparse approximation of the second
image with geometric atoms linked to the reference image.
The decoder receives the measurements $\hat{y}_2$ that are computed
by the quantized projections of $I_2$ onto a sensing matrix $\Phi$.
For each set of $K$ atom parameters $\Lambda = \{\gamma'_k\}$, the data term
$E_d$ reports the error between the measurements $\hat{y}_2$ and the orthogonal
projection of $\hat{y}_2$ onto $\Psi_\Lambda$, which is formed by the compressed
versions of the atoms, i.e., $\Psi_\Lambda = \Phi[g_{\gamma'_1} | g_{\gamma'_2} | \ldots | g_{\gamma'_K}]$. It turns
out that the orthogonal projection of $\hat{y}_2$ is given as $\Psi_\Lambda \Psi_\Lambda^{\dagger} \hat{y}_2$,
where $\dagger$ represents the pseudo-inverse operator. More formally,
the data cost is computed using the following relation:

$$E_d(\Lambda) = \| \hat{y}_2 - \Psi_\Lambda \Psi_\Lambda^{\dagger} \hat{y}_2 \|_2^2 = \| \hat{y}_2 - \Psi_\Lambda c \|_2^2. \qquad (7)$$

The data cost function given in Eq. (7) therefore first calculates
the coefficients $c = \Psi_\Lambda^{\dagger} \hat{y}_2$ and then measures the distance
between the observations $\hat{y}_2$ and $\Psi_\Lambda c$. In other words, the data
cost function $E_d$ accounts for the intensity variations between
images by estimating the coefficients $c$ of the warped atoms.
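
The data cost above amounts to a least-squares fit in the measurement domain. The following Python sketch shows one way to evaluate it with NumPy; the function name `data_cost` and the representation of the candidate atoms as a list of 2-D arrays are assumptions of the example.

```python
import numpy as np

def data_cost(y, phi, atoms):
    """Squared distance between y and its orthogonal projection onto span(Phi * atoms)."""
    # Stack the vectorised candidate atoms as columns: N x K.
    G = np.stack([np.asarray(a, dtype=float).reshape(-1) for a in atoms], axis=1)
    psi = phi @ G                                   # compressed atoms, M x K
    # Least-squares coefficients, equal to pinv(psi) @ y.
    c, *_ = np.linalg.lstsq(psi, y, rcond=None)
    residual = y - psi @ c
    return float(residual @ residual), c
```

When the candidate atoms explain the measurements exactly, the cost is zero up to rounding; a wrong set of candidate transformations leaves a residual, which is what the energy minimization penalises.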

When the measurements are quantized, the coefficient vector
$c$ fails to properly account for the error introduced by
the quantization. The quantized measurements only provide
the index of the quantization interval containing the actual
measurement value, and the actual measurement value could
be any point in the quantization interval. Let $y_2(i)$ be the
$i$-th coordinate of the original measurement and $\hat{y}_2(i)$ be the
corresponding quantized value. The joint
decoder has access only to the quantized value $\hat{y}_2(i)$ and not
the original value $y_2(i)$. Hence, the joint decoder only knows
that the original measurement lies within the quantization
interval, i.e., $y_2(i) \in \mathcal{R}_{\hat{y}_2(i)} = (r_i, r_{i+1}]$, where $r_i$ and $r_{i+1}$
define the lower and upper bounds of the quantizer bin $Q_i$.
We therefore propose to refine the data term in the presence of
quantization by computing a coefficient vector $c$ as the most
consistent coefficient vector when considering all the possible
measurement vectors that can result in the quantized measurement
vector $\hat{y}_2$. In more detail, the quantized measurements
$\hat{y}_2$ can be produced by all the observation vectors $y_2 \in \mathcal{R}_{\hat{y}_2}$,
where $\mathcal{R}_{\hat{y}_2}$ is the Cartesian product of all the quantization regions
$\mathcal{R}_{\hat{y}_2(i)}$, i.e., $\mathcal{R}_{\hat{y}_2} = \prod_i \mathcal{R}_{\hat{y}_2(i)}$. The data cost term given in Eq. (7)
can thus be modified as

$$\tilde{E}_d(\Lambda) = \min_{c,\, y_2} \| y_2 - \Psi_\Lambda c \|_2^2 \quad \text{s.t.} \quad y_2 \in \mathcal{R}_{\hat{y}_2}. \qquad (8)$$



Therefore, the robust data term $\tilde{E}_d(\Lambda)$ first jointly estimates
the coefficients $c$ and the measurements $y_2$, and then computes
the distance between $y_2$ and $\Psi_\Lambda c$. It can be shown that the
Hessian of the objective function $h(c, y_2) = \| y_2 - \Psi_\Lambda c \|_2^2$ in
Eq. (8) is positive semidefinite, i.e., $\nabla^2 h \succeq 0$, and hence the
objective function $h$ is convex. Also, the region $\mathcal{R}_{\hat{y}_2}$ forms a
convex set, as each region $\mathcal{R}_{\hat{y}_2(i)} = (r_i, r_{i+1}]$, $\forall i$, forms
a convex set. Hence, the optimization problem given in
Eq. (8) is convex, which leads to effective solutions.
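
Because the problem is convex with a simple box constraint, one possible way to approximate its solution is to alternate between the two blocks of variables: a least-squares update of the coefficients and a projection of the fitted measurements onto the quantization bins. The Python sketch below follows that idea. It is only an illustration under stated assumptions, not the solver used by the authors: the function name `robust_data_cost`, the fixed iteration count, the start from the bin centres and the use of closed bins are all choices made for the example.

```python
import numpy as np

def robust_data_cost(psi, lower, upper, num_iter=200):
    """Alternate c = pinv(psi) @ y with the projection of psi @ c onto [lower, upper]."""
    y = 0.5 * (lower + upper)            # assumed starting point: bin centres
    pinv = np.linalg.pinv(psi)
    for _ in range(num_iter):
        c = pinv @ y                     # best coefficients for the current y
        y = np.clip(psi @ c, lower, upper)   # closest point to psi @ c inside the bins
    c = pinv @ y
    residual = y - psi @ c
    return float(residual @ residual), c, y
```

Each of the two updates can only decrease the objective, so the returned cost is never larger than the plain data cost evaluated at the bin centres; the number of iterations needed for a tight solution depends on the problem and is not tuned here.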

C. Smoothness cost function 

The goal of the smoothness term $E_s$ in OPT-1 is to
regularize the atom transformations such that the transformations
are coherent for neighbor atoms. In other words, the
atoms in a spatial neighborhood are likely to undergo similar
transformations, when the correlation between images is due
to object or camera motion. Instead of penalizing directly
the transformation $\{F^k\}$ to be coherent for neighbor atoms,
we propose to generate a dense disparity (or motion) field
from the atom transformations and to penalize the disparity
(or motion) field such that it is coherent for adjacent pixels.
This regularization is easier to handle than a regularization of the set of
transformations $\{F^k\}$ and directly corresponds to the physical
constraints that explain the formation of correlated images.

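In more detail, for a given transformation value
$\delta\gamma = (t'_x - t_x,\ t'_y - t_y,\ \theta' - \theta,\ s_x/s'_x,\ s_y/s'_y)$ at pixel $z$, we
compute the horizontal component $m^h$ and vertical component $m^v$
of the motion field as

$$\begin{bmatrix} m^h(z) \\ m^v(z) \end{bmatrix} = \begin{bmatrix} m(z) - t_x \\ n(z) - t_y \end{bmatrix} - S\,R\,T, \qquad (9)$$

where $(m(z), n(z))$ represent the Euclidean coordinates. The
matrices $S$, $R$ and $T$ represent the grid transformations due to
scale, rotation and translation changes, respectively. They are
defined as

$$S = \begin{bmatrix} s_x/s'_x & 0 \\ 0 & s_y/s'_y \end{bmatrix}, \quad R = \begin{bmatrix} \cos(\theta' - \theta) & \sin(\theta' - \theta) \\ -\sin(\theta' - \theta) & \cos(\theta' - \theta) \end{bmatrix}, \quad \text{and} \quad T = \begin{bmatrix} m(z) - t'_x \\ n(z) - t'_y \end{bmatrix}.$$

Finally, the smoothness cost $E_s$ in OPT-1 is given as

$$E_s(\Lambda) = \sum_{(z, z') \in \mathcal{N}} V_{z,z'}, \qquad (10)$$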



where $z, z'$ are the adjacent pixel locations and $\mathcal{N}$ is the usual
4-pixel neighborhood. The term $V_{z,z'}$ in Eq. (10) captures the
distance between local transformations in neighboring pixels.
It is defined as

$$V_{z,z'} = \min\left( |m^h(z) - m^h(z')| + |m^v(z) - m^v(z')|,\ \tau \right). \qquad (11)$$

The parameter $\tau$ in Eq. (11) sets a maximum limit to the
penalty; it helps to preserve the discontinuities in the transformation
field that exist at boundaries of visual objects [36].
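
The smoothness term can be sketched in Python as follows. The example assumes that the motion at a pixel $z$ is obtained as $(z - t) - S R (z - t')$, with $t$ and $t'$ the atom positions before and after transformation, and that each unordered pair of 4-neighbors is counted once in the sum; both points are assumptions of this sketch, as are the function names.

```python
import numpy as np

def motion_field(shape, t, t_new, dtheta, scale_ratio):
    """Dense motion field induced by one atom transformation (assumed form, see lead-in)."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    m, n = cols, rows                       # m: horizontal, n: vertical coordinate
    c, s = np.cos(dtheta), np.sin(dtheta)
    tm, tn = m - t_new[0], n - t_new[1]     # T: position relative to the moved atom
    rm, rn = c * tm + s * tn, -s * tm + c * tn        # R T
    sm, sn = scale_ratio[0] * rm, scale_ratio[1] * rn  # S R T
    return (m - t[0]) - sm, (n - t[1]) - sn

def smoothness_cost(mh, mv, tau):
    """Truncated l1 distance between the motion of 4-connected neighbors."""
    cost = 0.0
    for axis in (0, 1):                     # vertical pairs, then horizontal pairs
        diff = np.abs(np.diff(mh, axis=axis)) + np.abs(np.diff(mv, axis=axis))
        cost += np.minimum(diff, tau).sum()
    return float(cost)
```

For a pure translation the field is constant and the cost vanishes, while a sharp motion boundary contributes at most `tau` per neighboring pair, which is how the truncation preserves discontinuities.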

D. Optimization algorithm 

We describe now the optimization methodology that is
used to solve OPT-1. Recall that our objective is to assign a
transformation $F^k$ to each atom $g_{\gamma_k}$ in the reference image
in order to build a set of smooth local transformations that
is consistent with the quantized measurements $\hat{y}_2$. The candidate
transformations are chosen from a finite set of labels
$\mathcal{L} = \mathcal{L}_x \times \mathcal{L}_y \times \mathcal{L}_\theta \times \mathcal{L}_{s_x} \times \mathcal{L}_{s_y}$, where $\mathcal{L}_x$, $\mathcal{L}_y$, $\mathcal{L}_\theta$, and
$\mathcal{L}_{s_x}$, $\mathcal{L}_{s_y}$ refer to the label sets corresponding to translation along the x
and y directions, rotations and anisotropic scales, respectively
(see Eq. (6)). One could use an exhaustive search on the entire
label set $\mathcal{L}$ to solve OPT-1. However, the cost of such a solution is
high, as the size of the label set $\mathcal{L}$ grows quickly with the
size of the search windows $\delta t_x, \delta t_y, \delta\theta, \delta s_x, \delta s_y$ (it is the
product of the per-parameter label counts). Rather than
doing an exhaustive search, we use graph-based minimization
techniques that converge to strong local minima or global
minima in polynomial time with tractable computational
complexity [36], [37].

Fig. 4. A graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is constructed using the set of vertices $\mathcal{V} =
\mathcal{Z} \cup \mathcal{L}$, with the pixel nodes $\mathcal{Z} = \{1, 2, \ldots, N\}$ and label nodes $\mathcal{L} =
\{l_1, l_2, \ldots, l_t\}$. Each pixel $z$ is connected to the l-node with a t-link. Some
t-links are omitted for the sake of clarity. The pixels $z, z' \in \mathcal{N}$ are connected
with an n-link. The correlation solution is given by a multiway cut that leaves
each p-node connected with only one t-link [36].

Usually in Graph Cut algorithms, a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is
constructed using a set of vertices $\mathcal{V}$ and edges $\mathcal{E}$. The set of
vertices is given as $\mathcal{V} = \mathcal{Z} \cup \mathcal{L}$, where $\mathcal{Z}$ defines the nodes
corresponding to the pixels in the images (p-nodes) and $\mathcal{L}$
defines the label nodes (l-nodes), as shown in Fig. 4. The p-nodes
that are in the neighborhood $\mathcal{N}$ are connected by an
edge called n-link. The cost of an n-link usually corresponds to
the penalty of assigning different labels to the adjacent pixels,
as given by $V_{z,z'}$. Also, each p-vertex $z$ is connected to the
l-node by an edge called t-link. The cost of a t-link connecting
a pixel and a label corresponds to the penalty of assigning the
corresponding label to that pixel; this cost is normally derived
from the data term. The final solution is given by a multi-way
cut that leaves each p-vertex connected with exactly one t-link.
For more details we refer the reader to [36].

In order to solve our OPT-1 problem, we first need to map
our cost functions on the graph in order to assign weights
to the n-links and t-links. For a given pair of transformation
labels at pixels $z$ and $z'$, it is straightforward to calculate the
weights of the n-links using Eq. (11). It should be noted that
the motion field for a given label is computed using Eq. (9).
We now describe how to calculate the cost of the t-links based
on the data cost $E_d(\Lambda)$. Let $Z_k$ be the set of pixels in the
support of the atom $g_{\gamma_k}$ that is given as

$$Z_k = \{ z = (x, y) \mid g_{\gamma_k}(x, y) > \epsilon \}, \qquad (12)$$

where $\epsilon > 0$ is a constant. Using this definition, we calculate
the t-link penalty cost of connecting a label node $l_k \in \mathcal{L}$ to all
the pixel nodes $z$ in the support of the atom $g_{\gamma_k}$ as $E_d(\Lambda)$ given
in Eq. (7), where $\Lambda = (\gamma_1, \gamma_2, \ldots, l_k \circ \gamma_k, \ldots, \gamma_K)$. That
is, the t-link cost computed between the label $l_k$ and the pixels
$z$, $\forall z \in Z_k$, is $E_d(\Lambda)$. However, due to atom overlapping, the
pixels in the overlapping region could be assigned more than
one label. In such cases, we compute the cost corresponding to
the index $k'$ of the atom that has the maximum atom response.
The index $k'$ is computed as

$$k' = \arg\max_{k} \ g_{\gamma_k}(z), \qquad (13)$$

where $g_{\gamma_k}(z) = g_{\gamma_k}(x, y)$ is the response of the $k$-th atom at the location
$z$. After mapping the cost
functions on the graph, we calculate the correlation solution
using a max-flow/min-cut algorithm [36]. Finally, the data term
$E_d$ in OPT-1 can be replaced with the robust data term $\tilde{E}_d$
given in Eq. (8) in order to provide robustness to quantization
errors. The resulting optimization problem can be efficiently
solved using Graph Cut algorithms as described above.
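
The handling of overlapping atom supports described above can be sketched in a few lines of Python. The function name `assign_pixels_to_atoms` and the convention of marking uncovered pixels with `-1` are assumptions of the example; the graph construction and the max-flow/min-cut step themselves are not shown.

```python
import numpy as np

def assign_pixels_to_atoms(atoms, eps):
    """For each pixel, index of the atom with the largest response, or -1 if no support covers it."""
    responses = np.stack([np.asarray(a, dtype=float) for a in atoms])  # K x H x W
    owner = np.argmax(responses, axis=0)          # atom with maximum response per pixel
    covered = (responses > eps).any(axis=0)       # pixel lies in at least one support
    return np.where(covered, owner, -1)
```

The resulting map tells, for every pixel, which atom's data cost should be attached to its t-links when several supports overlap.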

E. Complexity considerations 

We discuss now briefly the computational complexity of 
our correlation estimation algorithm which can basically be di- 
vided into two stages. The first stage finds the most prominent 
features in the reference image using sparse approximations 
in a structured dictionary. The second stage estimates the 
transformation for all the features in the reference image by 
solving a regularized optimization problem OPT-1. 

Overall, our framework offers a very simple encoding stage
with image acquisition based on random linear projections.
The computational burden is shifted to the joint decoder, which
can still trade off complexity and performance. Even if the
decoder is able to handle computationally complex tasks in
our framework, the complexity of our system stays reasonable
due to the efficiency of Graph Cuts algorithms, whose complexity
is bounded by a low-order polynomial [36], [37]. The
complexity can be further reduced in both stages compared to
the generic implementation proposed above. For example, the
complexity of the sparse approximation of the reference image
can be reduced significantly using a tree-structured dictionary,
without significant loss in the approximation performance [38].
In addition, a block-based dictionary can be used in order
to reduce the complexity of the transformation estimation
problem with block-based computations. Experiments show,
however, that this comes at the price of a performance penalty in
the reconstruction quality. Overall, the decoding
scheme proposed above offers high flexibility with an interesting
trade-off between complexity and performance.
For example, one might decide to use the simple data cost
$E_d$ even when the measurements are quantized; this leads to a
simpler scheme but to a reduced reconstruction quality.

V. Consistent image prediction by warping 

After correlation estimation, one can simply reconstruct an
approximate version of the second image $I_2$ by warping the
reference image $I_1$ using a set of local transformations that
forms the warping operator $W_\Lambda$ (see Fig. 1). The resulting
approximation is however not necessarily consistent with the
quantized measurements $y_2$; the measurements corresponding
to the projection of the image $\hat{I}_2$ on the sensing matrix $\Phi$ are
not necessarily equal to $y_2$. The consistency error might be
significant, because the atoms used to compute the correlation
and the warping operator do not optimally handle the texture
information.

We therefore propose to add a consistency term $E_t$ in
the energy model of OPT-1 and to form a new optimization
problem for improved image prediction. The consistency term
forces the image reconstruction through the warping operator
to be consistent with the quantized measurements. We define
this additional term $E_t$ as the $\ell_2$ norm error between the quantized
measurements generated from the reconstructed image
$\hat{I}_2 = W_\Lambda(\hat{I}_1)$ and the measurements $y_2$. The consistency term
$E_t$ is written as

$$E_t(\Lambda) = \| y_2 - \mathcal{Q}[\Phi \hat{I}_2] \|_2 = \| y_2 - \mathcal{Q}[\Phi W_\Lambda(\hat{I}_1)] \|_2, \tag{14}$$

where $\mathcal{Q}$ is the quantization operator. In the absence of
quantization, the consistency term simply reads as

$$E_t(\Lambda) = \| y_2 - \Phi W_\Lambda(\hat{I}_1) \|_2. \tag{15}$$

We then merge the three cost functions $E_d$, $E_s$ and $E_t$ with
regularization constants $\alpha_1$ and $\alpha_2$ in order to form a new
energy model $E_R$ for consistent reconstruction. It is given as

$$E_R(\Lambda) = E_d(\Lambda) + \alpha_1 E_s(\Lambda) + \alpha_2 E_t(\Lambda). \tag{OPT-2}$$

We now highlight the differences between the terms $E_d$ and
$E_t$ used in OPT-2. The data cost $E_d$ adapts the coefficient
vector to consider the intensity variations between images,
but it fails to properly handle the texture information. On
the other hand, the consistency term $E_t$ warps the atoms by
considering the texture information in the reconstructed image
$\hat{I}_1$, but it fails to carefully deal with the intensity variations
between images. These two terms therefore impose different
constraints on the atom selection that effectively reduce the
search space. We have observed experimentally that the quality
of the predicted image $\hat{I}_2$ is maximized when all three terms
are activated in the OPT-2 optimization problem.
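
As an illustration of the consistency term, the following Python sketch evaluates the $\ell_2$ error between the received measurements and the (optionally re-quantized) measurements of a warped image. It is a sketch under assumptions that are not taken from the paper: images are flattened into plain lists, the sensing matrix is a small dense list of rows, and the quantizer is a simple mid-tread uniform quantizer with step `step`. The names `uniform_quantize` and `consistency_cost` are hypothetical.

```python
import math


def uniform_quantize(values, step):
    """Assumed mid-tread uniform quantizer: round each value to the nearest multiple of step."""
    return [math.floor(v / step + 0.5) * step for v in values]


def consistency_cost(y2, phi, warped, step=None):
    """l2 error between y2 and the measurements of the warped image.

    With a step, the projection is quantized first (quantized case);
    with step=None, the raw projection is used (no quantization).
    """
    projected = [sum(p * w for p, w in zip(row, warped)) for row in phi]
    if step is not None:
        projected = uniform_quantize(projected, step)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y2, projected)))


# Toy example (assumed data): identity sensing matrix and a 2-pixel warped image.
phi = [[1.0, 0.0], [0.0, 1.0]]
warped = [0.4, 1.6]
y2 = [0.0, 2.0]
cost_quantized = consistency_cost(y2, phi, warped, step=1.0)
cost_unquantized = consistency_cost(y2, phi, warped)
```

In this toy case the warped image is exactly consistent with the quantized measurements (zero cost), although it differs from them before quantization, which is the behavior the consistency term is meant to tolerate.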

We propose to use the optimization method based on Graph
Cuts described in Section IV-D in order to solve OPT-2. In
particular, we map the consistency cost $E_t$ onto the graph
(see Fig. 4) in addition to the data cost $E_d$ and the smoothness
cost $E_s$. For a given $\Lambda = (\gamma_1, \gamma_2, \ldots, l_k \circ \gamma_k, \ldots, \gamma_K)$, we
propose to compute the t-link cost of connecting the label
$l_k \in \mathcal{L}$ to the pixels $z, \forall z \in Z_k$, as the cumulative sum
$E_d(\Lambda) + \alpha_2 E_t(\Lambda)$. In the overlapping regions, as described
earlier, we take the value corresponding to the atom index $k'$
that has the maximum response, as given in Eq. (13). Then, the
n-link weights for the adjacent pixels $z$ and $z'$ are computed
based on Eq. (11). After mapping the cost functions onto the
graph, the correlation solution is estimated using max-flow/min-cut
algorithms [36]. Finally, the data cost $E_d$ in
OPT-2 can again be replaced by the robust data term
given in Eq. (5). We show later that the performance of our
scheme improves by using the robust data term in the
presence of quantization. Lastly, the complexity of estimating
the correlation model with the OPT-2 problem is tractable, thanks
to the efficiency of Graph Cuts algorithms [36], [37].



VI. Correlation estimation of multiple image sets 

So far, we have focused on the distributed representation
of image pairs. In this section, we describe the extension
of our framework to datasets with $J$ correlated images
denoted as $I_1, I_2, \ldots, I_J$. Similar to the stereo setup, we
consider $I_1$ as the reference image. This image is given in
a compressed form $\hat{I}_1$ and its prominent features are extracted
at the decoder with a sparse approximation over the dictionary
$\mathcal{D}$ (see Section IV-A). The images $I_2, \ldots, I_J$ are sensed
independently using the measurement matrix $\Phi$ and their
respective measurements $y_2, \ldots, y_J$ are quantized and entropy
coded. Our framework can be applied to image sequences or
multi-view imaging. For the sake of clarity, we focus on a
multi-view imaging framework where the multiple images are
captured from different viewpoints.

We are interested in estimating a depth map $Z$ that captures
the correlation among the $J$ images by assuming that the camera
parameters are given a priori. The depth map is constructed
using the set of $K$ features $\{g_{\gamma_k}\}$ in the reference image and
the quantized measurements $y_2, \ldots, y_J$. We assume that the
depth values $Z$ are discretized such that the inverse depth $1/Z$
is uniformly sampled in the range $[1/Z_{max}, 1/Z_{min}]$, where
$Z_{min}$ and $Z_{max}$ are the minimal and maximal depths in the
scene, respectively [39]. The problem is equivalent to finding
a set of labels $l \in \mathcal{L}$ that effectively captures the depth
information for each atom $g_{\gamma_k}$ or pixel $z$ in the reference
image, where $\mathcal{L}$ is a discrete set of labels corresponding to
different depths. We propose to estimate the depth information
with an energy minimization problem OPT-3 which includes
three cost functions as follows:

$$H(\Lambda) = H_d(\Lambda) + \alpha_1 H_s(\Lambda) + \alpha_2 H_t(\Lambda), \tag{OPT-3}$$

where $H_d$, $H_s$ and $H_t$ represent the data, smoothness and
consistency terms, respectively. These three terms are balanced
with the regularization constants $\alpha_1$ and $\alpha_2$.
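
The inverse-depth discretization mentioned above can be sketched as follows. This is only an illustration: the function name `depth_labels` and the number of labels are assumptions made here, since the size of the label set is not specified in this section.

```python
def depth_labels(z_min, z_max, num_labels):
    """Depth candidates whose inverse 1/Z is uniformly spaced in [1/z_max, 1/z_min].

    Label 0 maps to the farthest depth z_max and the last label to z_min.
    """
    if num_labels < 2:
        raise ValueError("at least two labels are needed to span the range")
    inv_far, inv_near = 1.0 / z_max, 1.0 / z_min
    step = (inv_near - inv_far) / (num_labels - 1)
    return [1.0 / (inv_far + i * step) for i in range(num_labels)]


# Toy example (assumed scene range): four labels between depths 1 and 4.
labels = depth_labels(z_min=1.0, z_max=4.0, num_labels=4)
```

With this sampling the depth candidates are denser close to the camera, where a fixed depth change produces a larger displacement in the image.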

The data term $H_d$ assigns a set of labels $l_1, l_2, \ldots, l_K$
respectively to the $K$ atoms $g_{\gamma_1}, g_{\gamma_2}, \ldots, g_{\gamma_K}$ while respecting
consistency with the quantized measurements. It reads as

J 2 

where the matrix used in the data term for the $j$-th view is formed as
$\Phi[\mathcal{P}_j(g_{\gamma_1}, l_1), \mathcal{P}_j(g_{\gamma_2}, l_2), \ldots, \mathcal{P}_j(g_{\gamma_k}, l_k), \ldots, \mathcal{P}_j(g_{\gamma_K}, l_K)]$.
The operator $\mathcal{P}_j(g_{\gamma_k}, l)$ represents the projection
of the atom $g_{\gamma_k}$ to the $j$-th view when the local transformation
is given by the depth label $l$ (see Fig. 5). It can be noted
that the data term in Eq. (16) is similar to the data term
described earlier for image pairs, except that
the sum is computed over all the views. Depending on the
relative position of the $j$-th camera with respect to the reference
camera, the projection $\mathcal{P}_j(g_{\gamma_k}, l)$ can involve changes in the
translation, rotation or scaling parameters, or combinations of
them. Therefore, the projection $\mathcal{P}_j(g_{\gamma_k}, l)$ of the atom $g_{\gamma_k}$ to
the $j$-th view approximately corresponds to another atom in
the dictionary $\mathcal{D}$. It is interesting to note that the data cost is
minimal if the projection of the atom $g_{\gamma_k}$ onto another view
corresponds to its actual position in this view. This happens

Footnote: We assume here that we have no occlusions.

Fig. 5. Illustration of the atom interactions in the multi-view imaging scenario.
The original position of the features in all the images is marked in black.
The projection of the first feature $g_{\gamma_1}$ at $l = 2$ in the views $I_2$ and $I_3$ corresponds
to the actual position of the feature in the respective views and thus forms a
valid 3D region at $l = 2$. Meanwhile, the projection of the second feature $g_{\gamma_2}$
at $l = 4$ corresponds to the actual position only in view $I_3$ but not in view
$I_2$ (highlighted in red). Hence, the second feature does not intersect at
$l = 4$, which results in a suboptimal solution at $l = 4$.



when the depth label $l$ corresponds to the true distance to
the visual object represented by the atom $g_{\gamma_k}$. For example,
the projection of the feature $g_{\gamma_1}$ in Fig. 5 corresponds to the
actual position of the feature in views $I_2$ and $I_3$. Therefore,
the data cost for this feature at label $l = 2$ is minimal. On
the other hand, the projection of the feature $g_{\gamma_2}$ is far from
the actual position of the corresponding feature in the view

I2. The corresponding data cost \\y2 — ^a^a^^2||2 is high in 
this case, which indicates a suboptimal estimation of the depth
label $l$.

The smoothness cost $H_s$ enforces consistency in the depth
label for the adjacent pixels $z$ and $z'$. It is given as

$$H_s = \sum_{z, z' \in \mathcal{N}} \min\left( |Z(z) - Z(z')|, \tau \right), \tag{17}$$

where $\tau$ is a constant and $\mathcal{N}$ represents the usual 4-pixel
neighborhood. Finally, the consistency term $H_t$ favors depth
labels that lead to image predictions that are consistent with
the quantized measurements. We compute the consistency for
all the views as the cumulative sum of the terms $E_t$ given in
Eq. (14). More formally, the consistency term $H_t$ in the multi-view
scenario is computed as



$$H_t(\{l_k\}) = \sum_{j=2}^{J} E_t = \sum_{j=2}^{J} \left\| y_j - \mathcal{Q}[\Phi W_j(\hat{I}_1, \{l_k\})] \right\|_2, \tag{18}$$

where $W_j(\hat{I}_1, \{l_k\})$ predicts the $j$-th view using the set of
labels $\{l_k\}$ and the set of $K$ atoms $\{g_{\gamma_k}\}$. Finally, the OPT-3
optimization problem can be solved in polynomial time
using the graph-based optimization methodologies described
in Section IV-D. In this case, the weights of the t-links
connecting the label $l_k$ and the pixels $z, \forall z \in Z_k$, are
assigned as $H_d + \alpha_2 H_t$. The n-link cost for the neighboring
pixels $z, z' \in \mathcal{N}$ is assigned as $\min(|Z(z) - Z(z')|, \tau)$.
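
The truncated smoothness cost of Eq. (17) can be illustrated with a short Python sketch. It assumes that the depth labels are stored in a 2D nested list and that each pair of 4-connected neighbors is counted once; whether pairs are counted once or twice is a convention that only rescales $\alpha_1$. The function name `smoothness_cost` is hypothetical.

```python
def smoothness_cost(label_map, tau):
    """Sum of min(|Z(z) - Z(z')|, tau) over horizontally and vertically adjacent pixels."""
    height, width = len(label_map), len(label_map[0])
    total = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:  # right neighbor
                total += min(abs(label_map[y][x] - label_map[y][x + 1]), tau)
            if y + 1 < height:  # bottom neighbor
                total += min(abs(label_map[y][x] - label_map[y + 1][x]), tau)
    return total


# Toy 2x2 label map (assumed data) with one pixel at a very different depth label.
cost = smoothness_cost([[0, 0], [0, 5]], tau=2)
```

The truncation at $\tau$ bounds the penalty of a large label jump, so that genuine depth discontinuities are not over-penalized.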

VII. Experimental results 

A. Setup 

In this section, we report the performance of the correlation
estimation algorithms in stereo and multi-view imaging
frameworks. In order to compute a sparse approximation of
the reference image at the decoder, we use a dictionary $\mathcal{D}$ that
is constructed using two generating functions, as explained
in [35]. The first one consists of 2D Gaussian functions in
order to capture the low-frequency components (see Fig. 2).
The second function represents a Gaussian function in one
direction and the second derivative of a Gaussian in the
orthogonal direction in order to capture the edges. The discrete
parameters of the functions in the dictionary are chosen as
follows. The translation parameters $t_x$ and $t_y$ take any positive
value and cover the full height $N_1$ and width $N_2$ of the
image. Ten rotation parameters are used between $0$ and $\pi$
with increments of $\pi/18$. Five scaling parameters are equi-distributed
in the logarithmic scale from 1 to $N_1/8$ vertically,
and 1 to N2 1^.11 horizontally. The image I2 is captured by 
random linear projections using a scrambled block Hadamard
transform with a block size of 8 [12]. The measurements $y_2$
are quantized using a uniform quantizer and the bit rate is
computed by encoding the quantized measurements using an
arithmetic coder. Unless stated otherwise, the parameters $\alpha_1$
and $\alpha_2$ in the optimization problems are selected by trial
and error such that the estimated transformation
field maximizes the quality of the predicted image $\hat{I}_2$.
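
To make the acquisition step concrete, the following Python sketch shows one common way to build scrambled block Hadamard measurements: the pixels are randomly permuted, a block-diagonal Hadamard transform is applied, and a random subset of the coefficients is kept. It is an illustrative stand-in and not the exact operator of [12]; the normalization, the random number generator and the function names `hadamard` and `sbhe_measure` are assumptions made here.

```python
import random


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def sbhe_measure(pixels, num_measurements, block=8, seed=0):
    """Scramble the pixels, transform each block, and keep a random subset of coefficients."""
    n = len(pixels)
    if n % block != 0 or num_measurements > n:
        raise ValueError("pixel count must be a multiple of block and >= num_measurements")
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)  # scrambling: random permutation of the pixel positions
    scrambled = [pixels[i] for i in perm]
    h = hadamard(block)
    coeffs = []
    for start in range(0, n, block):  # block-diagonal Hadamard transform (unnormalized)
        chunk = scrambled[start:start + block]
        coeffs.extend(sum(hv * c for hv, c in zip(row, chunk)) for row in h)
    keep = sorted(rng.sample(range(n), num_measurements))  # random row selection
    return [coeffs[i] for i in keep]


# Toy example (assumed data): 64 pixel values measured at a 25% rate.
measurements = sbhe_measure(list(range(64)), num_measurements=16)
```

Because the transform only needs additions and subtractions within small blocks, such an operator keeps the encoder simple, which matches the low-complexity acquisition discussed in this paper.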

B. Generic transformation 

We first study the performance of our scheme with a pair
of synthetic images that contains three objects. The original
images $I_1$ and $I_2$ are given in Fig. 6(a) and Fig. 6(b), respectively.
It is clear that the common objects in the images have
different positions and scales. The absolute error between the
original images is given in Fig. 6(c), where the PSNR between
$I_1$ and $I_2$ is found to be 15.6 dB.
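
For reference, the PSNR values quoted in this section follow the usual definition based on the mean squared error. The sketch below assumes images stored as nested lists and a peak value of 255 (8-bit images); the peak value is an assumption, as it is not stated in this section, and the function name `psnr` is introduced here for illustration.

```python
import math


def psnr(image_a, image_b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal size."""
    squared_error = 0.0
    count = 0
    for row_a, row_b in zip(image_a, image_b):
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy example (assumed data): a uniform error of 255 gives 0 dB.
worst_case = psnr([[0, 0]], [[255, 255]])
```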

We encode the reference image $I_1$ to a quality of 35 dB
and the number of features used for the approximation of
$I_1$ is set to $K = 15$. The transformation field is estimated
with $\delta t_x = \delta t_y = 3$ pixels, $\delta s_x = \delta s_y = 2$ samples
and $\delta\theta = 0$. We first estimate the transformation field with
the OPT-1 problem by setting $\alpha_1 = 0$, i.e., the smoothness
term $E_s$ is not activated. The resulting motion field is shown
in Fig. 6(d). From Fig. 6(d) we observe that the proposed
scheme gives a good estimation of the transformation field
even with a 5% measurement rate and measurements quantized with 2
bits. We further see that the image $\hat{I}_2$ predicted with the help of
the estimated correlation information is closer to the original
image $I_2$ than to $I_1$ (see Fig. 6(e)). We then include the
consistency term in addition to the data cost and we solve the
problem OPT-2 without activating the smoothness term, i.e.,
$\alpha_1 = 0$. The estimated transformation field and the prediction

Fig. 6. Comparison of the estimated motion fields and the predicted images with the OPT-1 and OPT-2 problems in the synthetic scene. The motion field is
estimated using a measurement rate of 5% with a 2-bit quantization. (a) Original image $I_1$; (b) original image $I_2$; (c) absolute error between $I_1$ and $I_2$; (d)
motion field estimated with OPT-1 without activating $E_s$, i.e., $\alpha_1 = 0$; (e) prediction error with OPT-1 when the motion field in (d) is used for image prediction;
(f) motion field estimated with OPT-2 without activating $E_s$; (g) prediction error with OPT-2 when the motion field in (f) is used for image prediction; (h) motion
field estimated with OPT-2; (i) prediction error with OPT-2 when the motion field in (h) is used for image prediction. The smoothness energy $E_s$ of the motion
fields are (d) 4309 (f) 4851 and (h) 1479. The PSNR of the predicted images 12 in (e), (g) and (i) w.r.t. h are 20 dB, 20.4 dB and 21.53 dB respectively. 



error are shown in Fig. 6(f) and Fig. 6(g), respectively. We
observe that the consistency term improves the quality of the
motion field and the prediction quality. Finally, we highlight
the benefit of enforcing the smoothness constraint in our OPT-2
problem. The estimated transformation field with the OPT-2
problem including the smoothness term is shown in Fig. 6(h).
By comparing with the motion fields in Fig. 6(d) and Fig. 6(f), we
see that the motion field in Fig. 6(h) is smoother and more
coherent; this confirms the benefit of the smoothness term.
Quantitatively, the smoothness energy $E_s$ of the motion field
shown in Fig. 6(h) is 1479, which is clearly smaller than
that of the solutions given in Fig. 6(d) and Fig. 6(f) (resp. 4309 and
4851). Also, the smoothness term effectively improves the
quality of the predicted image, and the predicted image $\hat{I}_2$ gets
closer to the original image $I_2$, as shown in Fig. 6(i).

C. Stereo image coding 

We now study the performance of our distributed image
representation algorithms in stereo imaging frameworks. We
use two datasets, namely Plastic and Sawtooth. The images
are downsampled to a resolution of $N_1 = 144$, $N_2 = 176$
(the original resolutions of the datasets are 370 x 423 and 434 x 380,
respectively). We carry out experiments using views 1 and
3 for the Plastic dataset and views 1 and 5 for the Sawtooth
dataset. These datasets have been captured by a camera array
where the different viewpoints are uniformly arranged on a
line. As this corresponds to translating the camera along one of
the image coordinate axes, the disparity estimation problem becomes
a one-dimensional search problem and the smoothness

Footnote: These image sets are available at http://vision.middlebury.edu/stereo/data/



term in Eq. (10) is simplified accordingly. Viewpoint 1 is
selected as the reference image $I_1$ and it is encoded such that
the quality of $I_1$ is approximately 33 dB. Matching pursuit is
then performed on $I_1$ with $K = 30$ and $K = 60$ atoms for the
Plastic and Sawtooth datasets, respectively. The measurements
on the second image are quantized using a 2-bit quantizer.
At the decoder, the search for the geometric transformations
is carried out along the translational component with a
window size of $\delta t_x = 4$ pixels, and no search is considered along
the vertical direction, i.e., $\delta t_y = 0$. Unless stated explicitly,
we use the data cost $E_d$ given in Eq. (7) in the OPT-1 and
OPT-2 problems.

We first study the accuracy of the estimated disparity
information. In Fig. 7 we show the estimated disparity field
from 8870 quantized measurements (i.e., a measurement
rate of 35%) for the Plastic dataset. The groundtruth is
given in Fig. 7(a). The transformation is estimated by solving
OPT-1 and the resulting dense disparity field is illustrated
in Fig. 7(b). In this particular experiment, the parameter
$\alpha_1$ is selected such that the error in the disparity map is
minimized. The disparity error DE is computed between
the estimated disparity field and the groundtruth as
^ DE = ivr^ Ez=(.,,) {IM^z) - m'^(z)| > l} where 
Ni X N2 represents the pixel resolution of the image ESl- From 
Fig. 7(b), we observe that OPT-1 gives a good estimation of
the disparity map; in particular, the disparity value is correctly
estimated in the regions with texture or depth discontinuities.
We also observe that the estimation of the disparity field
is less precise in smooth regions, as expected from
feature-based methods. Fortunately, the wrong estimation of

Fig. 7. Comparison of the estimated disparity fields with OPT-1 and OPT-2 for the Plastic dataset: (a) groundtruth disparity field between views 1 and
2; (b) estimated disparity field with OPT-1; (c) error in the disparity map with OPT-1 (DE = 10.8%); (d) estimated disparity field with OPT-2; (e) error in
the disparity map with OPT-2 (DE = 4.1%). The disparity field is estimated using a measurement rate of 35% with a 2-bit quantization.




the disparity value corresponding to the smooth regions in the
images does not significantly affect the warped or predicted
image quality [25]. Fig. 7(c) confirms such a distribution of
the disparity estimation error, where the white pixels denote an
estimation error larger than one. We can see that the error in
the disparity field is highly concentrated along the edges, since
crisp discontinuities cannot be accurately captured due to the
scale and smoothness of the atoms in the chosen dictionary.
The disparity information estimated by OPT-2 is presented in
Fig. 7(d) and the corresponding error is shown in Fig. 7(e). In
this case, the regularization constants $\alpha_1$ and $\alpha_2$ in the OPT-2
problem are selected such that the DE is minimized. We see
that the addition of the consistency term $E_t$ in the correlation
estimation algorithm improves the performance.
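
The disparity error DE used above is the fraction of pixels whose estimated disparity differs from the groundtruth by more than one. A minimal Python sketch, assuming both fields are stored as nested lists of equal size (the function name `disparity_error` is hypothetical), is given below.

```python
def disparity_error(estimated, groundtruth, threshold=1):
    """Fraction of pixels with |estimated - groundtruth| > threshold."""
    total = 0
    wrong = 0
    for row_e, row_g in zip(estimated, groundtruth):
        for e, g in zip(row_e, row_g):
            total += 1
            if abs(e - g) > threshold:
                wrong += 1
    return wrong / total


# Toy 2x2 disparity fields (assumed data): only one pixel is off by more than one.
de = disparity_error([[0, 1], [2, 5]], [[0, 3], [2, 4]])
```

Multiplying the returned fraction by 100 gives a percentage comparable to the DE values reported in Fig. 7.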

We then study the rate-distortion (RD) performance of the
proposed algorithms in the prediction of the image $I_2$ in Fig. 8.
We show the performance of the reconstruction by warping
the reference image according to the correlation computed by
OPT-1 and OPT-2. We then highlight the benefit of using the
robust data term in the OPT-2 problem (denoted as OPT-2
(Robust)). We use the optimization toolbox based on CVX [40]
in order to solve the optimization problem given in Eq. (5).
We then compare the RD performance to a distributed coding
solution (DSC) based on the LDPC encoding of DCT coefficients,
where the disparity field is estimated at the decoder
using Expectation Maximization (EM) principles [6] (denoted as



Disparity learning). Then, in order to demonstrate the benefit
of geometric dictionaries, we propose a scheme denoted as
block-based that adaptively constructs the dictionary using
blocks or patches in the reference image [5]. We construct
a dictionary in the joint decoder from the reference image
$I_1$ segmented into 8 x 8 blocks. The search window size is
$\delta t_x = 4$ pixels along the horizontal direction. We then use
the optimization scheme described in OPT-2 to select the best
block from the adaptive dictionary. In order to have a fair
comparison, we encode the reference image $I_1$ similarly for
both schemes (Disparity learning and block-based) with a
quality of 33 dB (see Section III). Finally, we also provide
the performance of a standard JPEG 2000 independent encoding
of the image $I_2$. From Fig. 8, we first see that the
measurement consistency term $E_t$ significantly improves the
decoding quality, as OPT-2 gives better performance than OPT-1.
We further see that the OPT-2 problem with the robust data
cost improves the quality of the reconstructed image $\hat{I}_2$ by
0.5-1 dB at low bit rates. Then, the results confirm that the
proposed algorithms outperform independent
coding based on JPEG 2000, which underlines the benefits of
the use of correlation in the decoding of compressed correlated
images. At high rate, the performance of the proposed algorithms
however tends to saturate, as our model mostly handles
the geometry and the correlation between images, but it is
not able to efficiently handle the fine details or texture in

Fig. 9. RD performance with OPT-2 for decoding $I_2$ (view 5) as a function
of the quality of the reference image $I_1$ (resp. 28 dB, 33 dB and 38 dB) in
the Sawtooth dataset.




[Fig. 10 plot. Legend: JPEG 2000; Proposed at reference image rates 0.2, 0.3, 0.4, 0.5, 0.75 and 1.5. Horizontal axis: total bitrate (bpp).]

Fig. 10. Overall RD performance between views 1 and 5 of the Sawtooth
dataset. OPT-2 is used to predict the image $I_2$ (view 5) using the image $I_1$
(view 1) as the reference image. The image at view 5 is predicted with varying
reference image bit rates 0.1, 0.2, 0.3, 0.4, 0.5, 0.75 and 1.5.



the scene, since the decoding of the image $I_2$ is based on warping.
From Fig. 8, it is then clear that the reconstruction of image
$I_2$ based on OPT-1 and OPT-2 outperforms the DSC coding
scheme based on EM principles due to the accurate correlation
estimation. It is worth mentioning that the state-of-the-art DSC
scheme based on disparity learning also compensates for the
prediction error in addition to correlation estimation. Even so,
our scheme outperforms the DSC scheme due to
an accurate disparity field estimation. Finally, the experimental
results also show that our schemes outperform the scheme
based on the block-based dictionary, mainly because of the richer
representation of the geometry and local transformations with
the structured dictionaries.

We then study the influence of the quality of the reference
image $I_1$ on the reconstruction performance. We use OPT-2
to reconstruct $I_2$ (viewpoint 5) by warping when the reference
image has been encoded at different qualities (i.e., different bit
rates). Fig. 9 shows that the reconstruction quality of $I_2$ improves
with the quality of the reference image $I_1$, as expected. While
we have observed that the error in the disparity estimation
is not dramatically reduced by improved reference quality,
the warping stage provides more details in the
representation of $I_2$ when the reference is of better quality.
Finally, we study the overall RD performance for the Sawtooth
dataset between views 1 and 5, which also includes the bit
rate and quality of the reference image in addition to the
rate and quality of image $I_2$. Fig. 10 shows the overall RD
performance at reference image bit rates of 0.2, 0.3, 0.4, 0.5, 0.75
and 1.5 bpp. In our experiments, for a given reference image
quality we estimate the correlation model using OPT-2 (with
2-bit quantized measurements), and we compute the overall
RD performance at that specific reference image bit rate. As
shown before, the RD performance improves with increasing
reference image quality. When we take the convex hull of
the RD performances (which corresponds to implementing a
proper rate allocation strategy), we outperform independent
coding solutions based on JPEG 2000.
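
The convex hull of the RD operating points mentioned above can be computed with a standard monotone-chain scan. The Python sketch below is a generic illustration with made-up points, not the data of Fig. 10; the function name `upper_convex_hull` is hypothetical.

```python
def upper_convex_hull(points):
    """Upper convex hull of (rate, quality) points, returned from low to high rate."""
    hull = []
    for p in sorted(set(points)):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # A non-negative cross product means the middle point lies on or below
            # the segment joining its neighbors, so it is not on the upper hull.
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


# Made-up operating points (rate, PSNR in dB) used only to exercise the function.
operating_points = [(1, 25), (2, 30), (3, 31), (4, 36)]
frontier = upper_convex_hull(operating_points)
```

Each point kept on the hull corresponds to one choice of reference image rate and measurement rate; operating points below the hull are dominated by a combination of their neighbors.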

We now study the influence of the quantization bit rate
on the RD performance of $I_2$ with the OPT-2 optimization
scheme. We compress the measurements $y_2$ using 2-, 4- and
6-bit uniform quantizers. As expected, the quality of the correlation
estimation degrades when the number of bits is reduced,
as shown in Fig. 11(a). However, this is largely compensated by
the reduction in bit rate in the RD performance, as confirmed by
Fig. 11(b). This means that the proposed correlation estimation
is relatively robust to quantization, so that it is possible to
attain good RD performance by drastic quantization of the
measurements. Finally, we study the improvement offered by
the robust data term (see Eq. (5)) in OPT-2, when the
measurements have been compressed with a 2-bit uniform
quantizer. From Fig. 11(a) it is clear that the proposed robust
data term improves the performance due to the efficient
handling of noise in the quantized measurements.

D. Multi-view image representation 

We finally evaluate the performance of our multi-view
correlation estimation algorithms using five images from the
Tsukuba dataset (center, left, right, bottom and top views), and
five frames (frames 3-7) from the Flower Garden sequence
[39]. These datasets are downsampled by a factor of 2 and the
resolutions used in our experiments are 144 x 192 and
120 x 180 pixels, respectively. In both datasets, the reference
image $I_1$ (center view and frame 5, resp.) is encoded
with a quality of approximately 33 dB. The measurements
$y_j$, $j \in \{1, 2, 3, 4\}$, computed from the remaining four images
are quantized using a 2-bit quantizer. We first compare our
results to a stereo setup where the disparity information is
estimated with OPT-2 between the center and left images in the
Tsukuba dataset. Fig. 12 compares the inverse depth error
(sum of the labels with an error larger than one with respect
to the groundtruth) between the multi-view and stereo scenarios.
In this particular experiment, the parameters $\alpha_1$ and $\alpha_2$ are
selected such that they minimize the error in the depth image
with respect to the groundtruth. It is clear from the plot that
the depth error is smaller for a given measurement rate when all
the views are available. It should be noted that the x-axis in
Fig. 12 represents the measurement rate per view. Hence, the
total number of measurements used in the multi-view scenario
is higher than for the stereo case. However, these experiments
show that the proposed multi-view scheme gives a better depth

Fig. 12. Inverse depth error at various measurement rates of the Tsukuba multi-view
dataset. OPT-2 and OPT-3 problems are used to estimate the depth in
stereo and multi-view scenarios respectively. The measurements are quantized
using a 2-bit quantizer.



Fig. 13. Comparison of the overall RD performances between the proposed
OPT-3 scheme, joint encoding scheme and independent coding scheme based
on JPEG 2000. The bit rate of the reference image $I_1$ is not included in the total
bit budget.



image when more images are available. Similar experimental 
findings have been observed for the Flower Garden sequence. 

We then study the RD performance of the proposed multi-view
scheme in the decoding of four images (top, left, right and
bottom images in Tsukuba, and frames 3, 4, 6 and 7 in
Flower Garden). The images are decoded by warping the
reference image $I_1$ using the estimated depth image. Fig. 13
compares the overall RD performance (for 4 images) of
our multi-view scheme with the independent coding
performance based on JPEG 2000. As expected, the proposed
multi-view scheme outperforms independent coding solutions
based on JPEG 2000, as it benefits from the correlation between
images. Furthermore, as observed in distributed stereo coding,
the proposed multi-view coding scheme saturates at high
rates, as the warping operator captures only the geometry and
correlation between images and not the texture information.

Finally, we compare our results with a joint encoding
approach where the depth image is estimated from the original
images and transmitted to the joint decoder. At the decoder,
the views are predicted from the reconstructed reference image
$\hat{I}_1$ and the compressed depth image with the help of view
prediction. The results are presented in Fig. 13 (denoted as



Joint Encoding), where the bit rate is computed only on the
depth image encoded using a JPEG 2000 coding solution.
The main difference between the proposed and joint encoding
frameworks is that the quantized linear measurements are
transmitted for depth estimation in the former scheme,
while the depth information is directly transmitted in the latter
scheme. Therefore, by comparing these two approaches we
can judge the accuracy of the estimated correlation model
or, equivalently, the quality of the predicted view at a given
bit rate. From Fig. 13 we see that at low bit rates (below 0.2),
the proposed scheme estimates better structural information
than the joint encoding scheme, thanks to the
geometry-based correlation representation. However, at rates
above 0.2, we see that our scheme becomes comparable with
joint coding solutions. This leads to the conclusion that the
proposed scheme effectively estimates the depth information
from the highly compressed quantized measurements. It should
be noted that in the joint encoding framework the depth images are
estimated at a central encoder. In contrast, we estimate
the depth images at the central decoder from the independently
compressed visual information; this advantageously reduces
the complexity at the encoder, which makes it attractive for
distributed processing applications.

VIII. Conclusions 

In this paper, we have presented a novel framework for the
distributed representation of correlated images with quantized
linear measurements, along with joint decoding algorithms that
exploit the geometrical correlation among multiple images. We
have proposed a regularized optimization problem in order to
identify the geometrical transformations between compressed
images, which result in smooth disparity or depth fields
between a reference and one or more predicted image(s). We
have proposed a low-complexity algorithm for the correlation
estimation problem, which offers an effective trade-off between
the complexity and accuracy of the solution. In addition, we
have proposed a new consistency criterion such that the transformations
are consistent with the compressed measurements in
the predicted image. Experimental results demonstrate that the
proposed methodology provides a good estimation of dense
disparity/depth fields in different multi-view image datasets.
We also show that our geometry-based correlation model is
more efficient than block-based correlation models. Finally, the
consistency constraints prove to offer effective decoding quality
such that the proposed algorithm outperforms JPEG 2000
and DSC schemes in terms of rate-distortion performance,
even if the images are reconstructed by warping. This clearly
positions our scheme as an effective solution for distributed
image processing with low encoding complexity.